nexus: use all CockroachDB hosts from DNS to create DB connection URL. #3783

Merged
luqmana merged 1 commit into main from luqmana/crdb-all-the-hosts on Jul 31, 2023

@luqmana (Contributor) commented Jul 27, 2023

First pass at #3763 for crdb.

Even though we did query internal DNS, we were previously using only a single host when connecting to CRDB from Nexus. And since the internal DNS server always returns records in the same order, every Nexus instance always used the same CockroachDB instance, even now that we've been provisioning multiple instances. This also meant that if that CRDB instance went down, we'd be hosed (as seen in #3763).

To help with that, this PR changes Nexus to use all the CRDB hosts reported via internal DNS when creating the connection URL. As noted in some comments in the code, this is still not quite as robust as it could be, but short of something cueball-like it's still an improvement.
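A minimal sketch of the URL construction, assuming the DNS lookup has already produced a list of `SocketAddrV6` values. The helper name `crdb_url_from_hosts` is hypothetical and this is not the actual Nexus code; it only reproduces the `database_url` format that appears in the test log.

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

/// Hypothetical helper: build a multi-host CockroachDB connection URL from
/// every address returned by internal DNS, keeping the order DNS returned.
/// Returns `None` if DNS reported no hosts.
fn crdb_url_from_hosts(user: &str, hosts: &[SocketAddrV6], database: &str) -> Option<String> {
    if hosts.is_empty() {
        return None;
    }
    // `SocketAddrV6`'s `Display` impl renders "[addr]:port", which is the
    // bracketed form an IPv6 host needs inside a URL.
    let host_list: Vec<String> = hosts.iter().map(|addr| addr.to_string()).collect();
    Some(format!(
        "postgresql://{}@{}/{}?sslmode=disable",
        user,
        host_list.join(","),
        database
    ))
}

fn main() {
    // Addresses and port taken from the test log; a real caller would get
    // these from an internal DNS lookup instead.
    let hosts: Vec<SocketAddrV6> = [5u16, 3, 6, 4, 7]
        .iter()
        .map(|&last| {
            SocketAddrV6::new(Ipv6Addr::new(0xfd00, 0x1122, 0x3344, 0x101, 0, 0, 0, last), 32221, 0, 0)
        })
        .collect();
    if let Some(url) = crdb_url_from_hosts("root", &hosts, "omicron") {
        println!("{url}");
    }
}
```

With this approach each Nexus still lists the hosts in DNS order, so the first instance remains the preferred one; the gain is that the remaining instances are now available as fallbacks.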

To test, I disabled the CRDB instance Nexus had initially connected to, and Nexus was able to recover by connecting to the next CRDB instance and continuing to serve requests. The log shows a successful query, connection errors once I disabled fd00:1122:3344:101::5, and then a successful query once the connection was reestablished to the next CRDB instance (fd00:1122:3344:101::3):

23:43:24.729Z DEBG 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): authorize result
    action = Query
    actor = Some(Actor::UserBuiltin { user_builtin_id: 001de000-05e4-4000-8000-000000000003, .. })
    resource = Database
    result = Ok(())
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.729Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.730Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:24.730Z ERRO 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): database connection error
    database_url = postgresql://root@[fd00:1122:3344:101::5]:32221,[fd00:1122:3344:101::3]:32221,[fd00:1122:3344:101::6]:32221,[fd00:1122:3344:101::4]:32221,[fd00:1122:3344:101::7]:32221/omicron?sslmode=disable
    error_message = Connection error: server is shutting down
23:43:30.803Z DEBG 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): roles
    roles = RoleSet { roles: {} }
23:43:30.804Z DEBG 7be03b0d-48bf-4f43-a11e-7303236a3c5e (ServerContext): authorize result
    action = Query
    actor = Some(Actor::UserBuiltin { user_builtin_id: 001de000-05e4-4000-8000-000000000003, .. })
    resource = Database
    result = Ok(())
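Why a multi-host URL lets Nexus recover: PostgreSQL client libraries such as libpq, by default, try the hosts of a multi-host connection string in the order listed and use the first one that accepts a connection. The following is a simplified, std-only model of that behavior, not driver or Nexus code; `connect_first_available` and the `try_connect` closure are hypothetical stand-ins, and the simulated outage mirrors the test above in which fd00:1122:3344:101::5 was disabled.

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

/// Simplified model of multi-host connection handling: try each host in
/// the order listed and return the first one that accepts a connection.
/// If every host fails, return the error collected for each host.
/// `try_connect` is a hypothetical stand-in for a real connection attempt.
fn connect_first_available<C, E>(
    hosts: &[SocketAddrV6],
    mut try_connect: impl FnMut(&SocketAddrV6) -> Result<C, E>,
) -> Result<(SocketAddrV6, C), Vec<(SocketAddrV6, E)>> {
    let mut errors = Vec::new();
    for host in hosts {
        match try_connect(host) {
            Ok(conn) => return Ok((*host, conn)),
            Err(e) => errors.push((*host, e)),
        }
    }
    Err(errors)
}

fn main() {
    let host = |last: u16| {
        SocketAddrV6::new(Ipv6Addr::new(0xfd00, 0x1122, 0x3344, 0x101, 0, 0, 0, last), 32221, 0, 0)
    };
    // Same host order as the `database_url` in the log.
    let hosts = [host(5), host(3), host(6), host(4), host(7)];
    // Simulate the test: the first instance (::5) has been disabled.
    let down = host(5);
    let result = connect_first_available(&hosts, |addr| {
        if *addr == down {
            Err("server is shutting down")
        } else {
            Ok("connected")
        }
    });
    match result {
        Ok((addr, _)) => println!("connected to {addr}"),
        Err(errs) => println!("all {} hosts failed", errs.len()),
    }
}
```

This only models a single connection attempt; connection pooling and retry timing in the real system are outside the sketch.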

@luqmana force-pushed the luqmana/crdb-all-the-hosts branch from be1b1b1 to 14365ae on July 28, 2023 00:06
Review thread on sled-agent/src/services.rs (outdated, resolved)
Review thread on nexus/src/context.rs (resolved)
Review thread on sled-agent/src/services.rs (outdated, resolved)
@luqmana force-pushed the luqmana/crdb-all-the-hosts branch from 14365ae to a1fc081 on July 31, 2023 19:28
@luqmana merged commit a39a1a9 into main on Jul 31, 2023
20 checks passed
@luqmana deleted the luqmana/crdb-all-the-hosts branch on July 31, 2023 22:17